World Data League 2022

🎯 Challenge

Optimization of public transport routes during road interruptions by Cascais Municipality

Team: Moons of Jupyter

👥 Authors

💻 Development

Start coding here! 🐱‍🏍

Create the necessary subsections (e.g. EDA, different experiments, etc..) and markdown cells to include descriptions of your work where you see fit. Comment your code.

All new subsections must start with three hash characters.

Pro-tip 1: Don't forget to make the jury's life easier. Remove any unnecessary prints before submitting the work. Hide any long output cells (from training a model for example). For each subsection, have a quick introduction (justifying what you are about to do) and conclusion (results you got from what you did).

Pro-tip 2: Have many similiar graphs which all tell the same story? Add them to the appendix and show only a couple of examples, with the mention that all the others are in the appendix.

(0) Approach and Methodology

Optimization of public transport in cities is without a question a hot topic. The perfomance of the public transport has direct consequences not only in the quality of life of the citizens of each city (because of connectivity), but also in the overall CO2 emissions of an entire country. It is a big challenge to focus on. To solve a Vehicle Routing Problem (VRP) with the size of an entire city is a NP-Hard problem. This means: Easy to check once it is solved, but very difficult and time consuming to solve it to optimality (if an optimal solution can actually be found). Because of this we decided to go step by step, and even before considering to go with heuristics and other methodologies, our approach and methodology was based on Divide and Conquer. Our idea aims to go from general to specific, and the methodology is shown below.

Methodology.001-2.jpeg

Figure 1. Proposed Methodology

From general to specific, the methodology follows the following:

  1. Caiscais Transit Data:
    • Gather and interpret geospatial data from Caiscais Municipality
    • The data included:
      • MNIVP Dataset: Dataset with public interventions between 2014 and 2021, with a total of 6617 registers
      • EIXODEVIA Dataset: Dataset with the streets of the city in coordinates (Longitude, Latitude)
      • MNCAUTOCARRO Dataset: Bus routes in coordinates (Longitude, Latitude)
      • GTFS Datasets: Bus routes and stops in coordinates (Longitude, Latitude)
      • POI Data: From Google Maps (Manually) POI data was obtained to check for important areas of the city.
  1. Analysis of Street Interventions:
    • From the Caiscais data it was obtained the total number of days each street was off service (at least at one point), total number of interrumptions per street, and other type of description of the phenomenon.
    • Based on this information, one street was chosen based on its perfomance (number of interruptions) and its importance to the city (location).
    • The neighborhood of the chosen city is exported to do a route optimization for a single bus route in case of interruption occurring in this street.
  1. Route Optimization on a Single Street:
    • An Optimization Model for Shortest Path (STP) using Dijsktra's Exact Algorithm is implemented in this part. This implementation considers using the neighborhood selected in part (2) as a network, and the bus route as the subject of optimization.
    • Dijkstra's algorithm for STP stands as the base case, in which the routes is optimized only considering minimization of the new alternative route. Then this algorithm was modified to include rewards by visiting the closest points to the original bus route. Then this algorithm is again modified to include also penalties for deviating too much for its original route.
    • The solutions promise good future scaling for the entire city.

(1) Exploratory Data Analysis

1.1 Data Preprocessing

The following files were preprocessed as described:

  1. eixodevia.geojson:
    • As geojson data, the file needed some corrections to make it redable to geopandas library.
    • In the feature geometry there was one extra set of brackets that the library misunderstood as empty string eventhough the data was given. After deleting the extra (unnecessary) set of brackets the code worked as a charm.
    • For example below was the original data:
      • "geometry": {{"type": "LineString","coordinates": [[-9.3907362281, 38.7355689813],[-9.3912515233, 38.7355664987]]}}
      • In this example there are two curly braces '{}' after geometry which is incorrect.
    • Geopandas need it to be in below format:
      • "geometry": {"type": "LineString","coordinates": [[-9.3907362281, 38.7355689813],[-9.3912515233, 38.7355664987]]}
  1. mnivp.csv:
    • The portuguese characters of the street names were change to standard english, to avoid mis-manipulation.
    • 54 points have end date before start date. We assumed that it was a typo mistake, and inverted them.
    • Interventions without start and end dates were deleted (31 points)
    • Columns 'Temporary reception', 'Final reception' and 'Process number' were deleted for having more than 10% missing values.
    • Interventions without street names were deleted (56 points)

1.2 Caiscais City Network

Caiscais is a city in Portugal (near Lisbon), which has 987 registered different streets as it can be seen below.

Caiscais contains 44 bus routes to give the city connectivity. In total, these 44 bus routes traverse the city through 1043 bus stops. Their routes and stops in the city (what we call Caiscais City Network) can be seen below.

With this exploratory analysis of the public transport of the city of Caiscais, we can have an idea of the size and magnitude of the problem. This confirms our approach. In order to pursue further, we need to select a promising area to prove our hypothesis. To do so, it is necessary to analise the data related to the interruptions/interventions of the streets of Caiscais.

(2) Caiscais Street Interventions Analysis

The data contains all interventions that took place in the streets of Caiscais during the period 2013 - 2021, which in total sum 6516 data points (single interventions). For simplicity, we assumed that any intervention occurring in a street involves the detention of that street. This means that whenever an intervention takes place, the transit through that street halts.

2.1 Frequency of Street Interventions

Let's see the distribution of the off time of the interventions. This means: the distribution of the times each street was off transit.

In average, the streets that have been intervened in Caiscais have been halted for 13.3 days. Concerning the continuity of the public transport as a system, this confirms the need to have resilient alternative mechanism to the preventive and corrective interventions of the streets in the future.

The previous graph tells us that most of the interventions and cuts in the streets occur during January-March, which makes sense considering that in the winter time, less tourist visit the area, so the downsides of interventions are lesser.

2.2 Interventions per Street: The Streets of the Cita as a Color Map

To account for the frequency of interventions per street, we muss work the data and then plot the geospatial data in scale.

There is a red street in the colormap (More than 21 interruptions) that crosses most of the POI data of the City. This street is named: "Avenida 25 de Abril". Let's see the surroundings in a zoom of the colormap.

Based in all the above, the street Avenida 25. de Abril is chosen to perform a local bus route optimization, based on our approach. This decision is supported by:

(3) Optimization on a Single Street: Avenida 25. de Abril

Visualising the Avenida 25. de Abril street and its neighborhood we have the following network.

The area will be represented by an ideal network, where each street intersection will be a node, and the original length of each edge (segment between intersections per street) will be treated as a "cost" which magnitud will be the euclidean distance. Graphically, this is shown below.

This neighborhood is represented virtually by a distance matrix where:

This matrix is shown below.

Implementation of Dijkstra's Aglorithm for finding the shortest path in the street of Cascais.

We removed an edge and our model calculated the shortest distance, which is indicated below,

(3) Conclusion

With the shown approach we have proven that our believe of devide and conquer works. Considering the structure of a network that consist of 'Avenida 25. de Abril' and bus route M43, the Dijkstra's exact algorithm for shortest path problem inclusing rewards to closeness to formal bus stops, and penalties for bus deviations is able to provide an optimal solution. This instance represents the first steps towards a city-scale problem. As this is going to be a NP-Hard problem because of the magnitude and the number of bus-routes to optimize, the next steps should consider the implementation of Heuristics such as [Orinteering algorithm](https://www.cs.cmu.edu/~avrim/Papers/orienteering-sicomp.pdf

Our product is a methology with which we convinced that Cascais Municipality can thrive in preserving the quality of publich transport, by means of Data Science and Operational Research.

🖼️ Visualisations

Copy here the most important visualizations (graphs, charts, maps, images, etc). You can refer to them in the Executive Summary.

Technical note: If not all the visualisations are visible, you can still include them as an image or link - in this case please upload them to your own repository.

👓 References

List all of the external links (even if they are already linked above), such as external datasets, papers, blog posts, code repositories and any other materials.

⏭️ Appendix

Add here any code, images or text that you still find relevant, but that was too long to include in the main report. This section is optional.